* Cifar10 scaffold (#127)
* SCAFFOLD Integration
scaffold learner
scaffold configs
initialize the scaffold terms on the client
update README to include SCAFFOLD description
add scaffold experiment
update urls
add LICENSE from NIID-Bench
update README and plots to reflect SCAFFOLD experiment
refactor scaffold computation steps into their own member functions
remove unused import
refactor to use ScaffoldLearner class
Scaffold learner depends on PyTorch; renamed accordingly
use two standard aggregators, inherit SCAFFOLD workflow from standard ScatterAndGather
remove custom aggregator
use update_shareable
use multiple inheritance for scaffold learner
use new aggregator that supports several DXOs
fix updating call for global model
simplify dxo/shareable handling
use built-in PTFileModelLocator
fix formatting
add PT Formatter
use built-in validation JSON generator; remove custom formatter
remove custom JSON validation generator
update license
restore run_secure.sh
restore project yml
restore main branch learner executor
add SCAFFOLD link to example readme
remove custom validation JSON generator
print model owner info during validation
remove custom formatter
remove special handling of validation dxo. Not needed anymore
use zeros_like() to initialize scaffold terms
* use scaffold helper class
* formatting
* make scaffold function args consistent
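For readers new to SCAFFOLD, the client and server steps that the commits above wire into the learner and aggregators can be sketched in plain Python. This is a minimal sketch of the update rules from the original SCAFFOLD paper (drift-corrected local SGD plus the "option II" control-variate update), not the NVFlare learner code; the function names `scaffold_client_update` and `scaffold_server_update` are hypothetical, and the plain-Python `zeros_like` only stands in for `torch.zeros_like`.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def zeros_like(v: Sequence[float]) -> Vector:
    """Plain-Python stand-in for torch.zeros_like: same length, all zeros."""
    return [0.0] * len(v)


def scaffold_client_update(
    x: Sequence[float],         # global model received from the server
    c_global: Sequence[float],  # server control variate c
    c_local: Sequence[float],   # this client's control variate c_i
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    lr: float,
    steps: int,
) -> Tuple[Vector, Vector, Vector]:
    """Run `steps` corrected SGD steps; return (delta_y, delta_c, new c_i)."""
    y = list(x)
    for _ in range(steps):
        g = grad_fn(y)
        # Drift correction: step along g - c_i + c instead of the raw gradient g.
        y = [yj - lr * (gj - cij + cj) for yj, gj, cij, cj in zip(y, g, c_local, c_global)]
    # "Option II" control-variate update: c_i+ = c_i - c + (x - y) / (K * lr).
    c_local_new = [
        cij - cj + (xj - yj) / (steps * lr)
        for cij, cj, xj, yj in zip(c_local, c_global, x, y)
    ]
    delta_y = [yj - xj for yj, xj in zip(y, x)]
    delta_c = [new - old for new, old in zip(c_local_new, c_local)]
    return delta_y, delta_c, c_local_new


def scaffold_server_update(
    x: Sequence[float],
    c_global: Sequence[float],
    deltas_y: Sequence[Sequence[float]],
    deltas_c: Sequence[Sequence[float]],
    global_lr: float,
    num_clients_total: int,
) -> Tuple[Vector, Vector]:
    """Aggregate model deltas and control-variate deltas separately."""
    n = len(deltas_y)
    # x <- x + global_lr * mean(delta_y) over the participating clients.
    x_new = [xj + global_lr * sum(d[j] for d in deltas_y) / n for j, xj in enumerate(x)]
    # c <- c + (|S| / N) * mean(delta_c), which simplifies to sum(delta_c) / N.
    c_new = [cj + sum(d[j] for d in deltas_c) / num_clients_total for j, cj in enumerate(c_global)]
    return x_new, c_new
```

Because each client returns two independent deltas (model and control variate), the server averages them as two separate quantities, which is presumably why the workflow ended up using two standard aggregators over several DXOs rather than one custom aggregator.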
* Add versioneer so nvflare.__version__ has meaningful value (#215)
* Add versioneer so nvflare.__version__ has meaningful value
* Fix coding style and license for versioneer
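Versioneer derives `__version__` from `git describe` output. The sketch below is a simplified re-creation of its default "pep440" style (`TAG[+DISTANCE.gHEX[.dirty]]`), not versioneer's own code; the `tag_prefix="v"` default is an assumption, and the untagged `--always` fallback (which versioneer renders as `0+untagged...`) is deliberately left out.

```python
import re

# Matches `git describe --tags --dirty --always --long` output such as
# "v2.0.0-3-gabc1234-dirty": tag, commits since tag, short hash, dirty flag.
_DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<short>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def render_pep440(describe: str, tag_prefix: str = "v") -> str:
    """Render describe output as TAG[+DISTANCE.gHEX[.dirty]]."""
    m = _DESCRIBE_RE.match(describe)
    if m is None:
        raise ValueError(f"unrecognized describe output: {describe!r}")
    tag = m.group("tag")
    if tag.startswith(tag_prefix):
        tag = tag[len(tag_prefix):]
    distance = int(m.group("distance"))
    dirty = m.group("dirty") is not None
    if distance == 0 and not dirty:
        return tag  # exactly on a clean, tagged commit
    local = f"{distance}.g{m.group('short')}"
    if dirty:
        local += ".dirty"
    # A PEP 440 local version may contain only one "+".
    sep = "." if "+" in tag else "+"
    return f"{tag}{sep}{local}"
```

A clean checkout of a release tag yields the bare release number, while development builds carry the commit distance and hash, which is what makes the version string meaningful.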
* Add information about cross site validation, commands, and hostname to documentation (#206)
* Add information about cross site validation to documentation
* More minor documentation updates
* Fix link
* Add persistence to admin cli (#210)
* Add persistence to admin cli
* save to ~/.nvflare
* 211 Update hello-monai example (#212)
* update hello-monai
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
* decrease aggregation_epochs
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
* Initial implementation of Ditto algorithm (#144)
Refactor ditto with multiple inheritance
Refactor ditto with multiple inheritance
Update README.md
move learners to app_common with other minor updates
update license section
update for unit test
update for unit test
update for unit test
remove redundant init parameters
Update supervised_ditto_learner.py
Update README.md
Update README.md
Update README.md
Update README.md
Update supervised_ditto_learner.py
move learners back to custom folder; will discuss further and make the move in a separate PR
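Ditto trains a personalized model on each client alongside the shared global model, pulling the personalized weights toward the global ones with a proximal term weighted by λ. A minimal plain-Python sketch of the personalized step, assuming the local loss gradient is supplied as `grad_fn`; `ditto_personal_update` and its parameters are hypothetical and this is not the `supervised_ditto_learner.py` code.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def ditto_personal_update(
    v: Sequence[float],         # this client's personalized model
    w_global: Sequence[float],  # current global model (held fixed here)
    grad_fn: Callable[[Sequence[float]], Sequence[float]],
    lr: float,
    lam: float,
    steps: int,
) -> Vector:
    """Minimize F_k(v) + (lam / 2) * ||v - w||^2 with plain gradient descent."""
    v = list(v)
    for _ in range(steps):
        g = grad_fn(v)
        # Gradient of the proximal term (lam / 2) * ||v - w||^2 is lam * (v - w).
        v = [vj - lr * (gj + lam * (vj - wj)) for vj, gj, wj in zip(v, g, w_global)]
    return v
```

With a local loss of (v - 3)^2 and a global model at 0, the personalized optimum is 6 / (2 + λ): λ = 0 recovers purely local training (v = 3), while larger λ pulls v toward the global model.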
* Initial implementation for HA.
* Added FL client HA support.
* Enhanced the Client HA.
* HA function working for server & client.
* Working with SnapshotFilePersistor config.
* Made the overseer_agent configurable.
* Made the admin support HA.
* Adjust retry.
* Adjust admin prompt.
* Added storage_state_persistor.py
* Integrated with new overseer-agent.
* Changed to use admin Overseer-agent.
* Updated overseer_spec.py
* Fixed end line.
* Added storage and used the dev-2.1 cli.py.
* Used dev-2.1 cli.py.
Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com>
Co-authored-by: Isaac Yang <isaacy@nvidia.com>
Co-authored-by: nvkevlu <55759229+nvkevlu@users.noreply.github.com>
Co-authored-by: Sean Yang <seany314@gmail.com>
Co-authored-by: Yiheng Wang <68361391+yiheng-wang-nv@users.noreply.github.com>
Co-authored-by: Ziyue Xu <71786575+ZiyueXu77@users.noreply.github.com>
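The HA commits above rely on persisted snapshots (e.g. the SnapshotFilePersistor config) so that a newly promoted server can resume a run. The class below is a hypothetical stand-in, not NVFlare's persistor API: a sketch of the write-to-temp-then-rename pattern such a file-based persistor typically needs so that a crash never leaves a half-written snapshot behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class FileSnapshotStore:
    """Hypothetical file-based snapshot store (illustration only)."""

    def __init__(self, directory: Path, name: str = "snapshot.json") -> None:
        self.path = Path(directory) / name
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def save(self, state: Dict[str, Any]) -> None:
        # Write to a temp file in the same directory, then rename over the target.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())  # ensure bytes reach disk before the rename
            os.replace(tmp, self.path)  # atomic rename on POSIX file systems
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise

    def load(self) -> Optional[Dict[str, Any]]:
        """Return the last complete snapshot, or None if none exists."""
        if not self.path.exists():
            return None
        with open(self.path) as f:
            return json.load(f)
```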
SPARK-32038 reports a regression in Apache Spark 3.0.0: it fails to
normalize NaN and zero float values during DISTINCT aggregations.
This causes a mismatch in results between Apache Spark 3.0.0 on CPU
and the RAPIDS Accelerator (which returns the correct results).
SPARK-32038 was fixed in apache/spark#28876.
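Why normalization matters: -0.0 and 0.0 compare equal, and all NaNs should form a single group, yet their 8-byte encodings differ. Assuming, as a simplification, that grouping keys are compared by their raw bytes, the effect can be sketched in plain Python; `normalize_float` and `distinct_by_bits` are hypothetical helpers, not Spark or RAPIDS code.

```python
import math
import struct
from typing import Dict, Iterable, List


def normalize_float(v: float) -> float:
    """Map every NaN to one canonical NaN and -0.0 to 0.0."""
    if math.isnan(v):
        return math.nan
    if v == 0.0:  # true for both 0.0 and -0.0
        return 0.0
    return v


def distinct_by_bits(values: Iterable[float], normalize: bool) -> List[float]:
    """DISTINCT that groups on the raw 8-byte encoding of each value."""
    groups: Dict[bytes, float] = {}
    for v in values:
        key_value = normalize_float(v) if normalize else v
        groups.setdefault(struct.pack(">d", key_value), key_value)
    return list(groups.values())


def float_from_hex(bits: str) -> float:
    """Build a double from its big-endian hex encoding (e.g. a NaN payload)."""
    return struct.unpack(">d", bytes.fromhex(bits))[0]
```

Without normalization, 0.0, -0.0, and two NaNs with different payloads land in four separate groups; with normalization they collapse to two.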
The following change introduces a conditional xfail test that passes
on Apache Spark 3.0.1 and 3.1+ (which fix SPARK-32038) but is an
expected failure on Spark 3.0.0.
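The xfail condition only needs to single out the affected release. A minimal sketch of that version check, with hypothetical helper names (`parse_spark_version`, `expect_spark_32038_failure`) rather than the test suite's actual utilities:

```python
import re
from typing import Tuple


def parse_spark_version(version: str) -> Tuple[int, int, int]:
    """Parse '3.0.0', '3.0.1', '3.1.0-SNAPSHOT', ... into (major, minor, patch)."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def expect_spark_32038_failure(version: str) -> bool:
    """True only on the release that carries the SPARK-32038 regression."""
    return parse_spark_version(version) == (3, 0, 0)
```

In a pytest suite, the result would be passed as the condition of the marker, e.g. `@pytest.mark.xfail(expect_spark_32038_failure(spark_version), reason="SPARK-32038")`, where `spark_version` is assumed to come from the running session.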
Testing: